ML Model Drift Detection Using Modified GAN

ABSTRACT

Discussed herein are devices, systems, and methods for machine learning (ML) model drift detection. A method can include receiving machine learning (ML) data defining a number of layers of neurons, a number of neurons per each layer, and weights for each neuron of a deployed ML model, operating the deployed ML model in a modified generative adversarial network (GAN) architecture, while operating the deployed ML model, recording output of a hidden layer of the deployed ML model, determining a metric of the output, and re-deploying the deployed ML model and monitoring whether the re-deployed ML model is suffering from ML model drift based on the metric.

RELATED APPLICATION

This application claims the benefit of priority to U.S. Provisional Patent Application 63/270,672, titled "ML Model Drift Detection Using Modified GAN", and filed on Oct. 22, 2021, which is incorporated herein by reference in its entirety.

TECHNICAL FIELD

Embodiments discussed herein regard devices, systems, and methods for identifying machine learning (ML) drift, compensating for identified ML drift, or a combination thereof.

BACKGROUND

An ML model can require a prohibitively large amount of training data for sufficiently accurate and robust operation. Many times, a training dataset is too sparse to train an ML model accurately for deployment as a robust object/target classifier. A further complication for ML techniques is "deterioration" of neural networks (NNs), also known as model drift. As data evolves, such as to reflect the environment it represents, the information required to keep an NN sufficiently robust evolves and expands as well. As a result, a need for newly evolved data will continuously exist to maintain the essential robustness of the ML model.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 illustrates, by way of example, a diagram of a GAN architecture.

FIG. 2 illustrates, by way of example, a diagram of an embodiment of a modified GAN architecture for quantifying ML model drift analytics.

FIG. 3 illustrates, by way of example, a block diagram of an ML system for model drift detection.

FIG. 4 illustrates, by way of example, a block diagram of an embodiment of a method for ML model drift detection.

FIG. 5 is a block diagram of an example of an environment including asystem for neural network training, according to an embodiment.

FIG. 6 illustrates, by way of example, a block diagram of an embodiment of a machine in the example form of a computer system within which instructions, for causing the machine to perform any one or more of the methodologies discussed herein, may be executed.

DETAILED DESCRIPTION

FIG. 1 illustrates, by way of example, a diagram of a GAN architecture. The GAN architecture is well known and includes a generator network 102, a discriminator network 104, and an error propagator 106. The generator network 102 generates fake data 112 based on noise 108. The noise 108 can be processed to generate fake data 112 that includes features that are within the distribution of the training data of the discriminator network 104. A goal of the generator network 102 is to generate fake data 112 that is indistinguishable, to the discriminator network 104, from the real data 110. The real data 110 is actual data that is gathered and not synthetically generated. For example, the real data 110 can include images generated by a camera or other optical sensor, or the like. The real data 110 can include images, videos, audio, a text or multimedia file, a combination thereof, or the like.

A configuration of the generator network 102 can include a projection and reshaping operation, followed by a repetition of three consecutive layers including a transposed convolution, batch normalization, and a rectified linear unit (ReLU). The final hidden layer can perform another transposed convolution. The final layer can perform a hyperbolic tangent operation.
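
By way of illustration only, a minimal sketch of such a generator, written in PyTorch, is below. The noise dimension (100), the channel counts, and the 32x32 output size are assumptions made for the sketch, not values taught by the embodiments.

    # Illustrative sketch only; sizes are assumed, not specified by the embodiments.
    import torch
    import torch.nn as nn

    class Generator(nn.Module):
        def __init__(self, noise_dim=100):
            super().__init__()
            self.project = nn.Linear(noise_dim, 4 * 4 * 256)  # projection
            self.net = nn.Sequential(
                # repeated transposed convolution -> batch normalization -> ReLU blocks
                nn.ConvTranspose2d(256, 128, 4, stride=2, padding=1),
                nn.BatchNorm2d(128),
                nn.ReLU(),
                nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1),
                nn.BatchNorm2d(64),
                nn.ReLU(),
                nn.ConvTranspose2d(64, 3, 4, stride=2, padding=1),  # final hidden layer: transposed convolution
                nn.Tanh(),  # final layer: hyperbolic tangent
            )

        def forward(self, noise):
            x = self.project(noise).view(-1, 256, 4, 4)  # reshaping
            return self.net(x)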

A configuration of the discriminator network 104 can include a dropout layer, hidden layers that include a convolution layer and a leaky ReLU, followed by a repetition of three consecutive layers including a convolution layer, a batch normalization layer, and a leaky ReLU layer. A final layer of the discriminator network can include a convolution.
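
By way of illustration, a matching PyTorch sketch of the discriminator stack is below; the dropout rate, leaky-ReLU slope, and channel counts are assumptions for the sketch.

    # Illustrative sketch only; assumes 3x32x32 inputs matching the generator sketch above.
    import torch.nn as nn

    discriminator = nn.Sequential(
        nn.Dropout(0.4),                           # dropout layer
        nn.Conv2d(3, 64, 4, stride=2, padding=1),  # convolution
        nn.LeakyReLU(0.2),                         # leaky ReLU
        # repeated convolution -> batch normalization -> leaky ReLU blocks
        nn.Conv2d(64, 128, 4, stride=2, padding=1),
        nn.BatchNorm2d(128),
        nn.LeakyReLU(0.2),
        nn.Conv2d(128, 256, 4, stride=2, padding=1),
        nn.BatchNorm2d(256),
        nn.LeakyReLU(0.2),
        nn.Conv2d(256, 1, 4),                      # final layer: convolution producing a real/fake score
    )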

The discriminator network 104 receives the real data 110 and the fake data 112 and produces data 114 indicating whether the fake data 112 is real or fake and whether the real data 110 is real or fake. The discriminator network 104 and the generator network 102 can be updated based on error determined by error propagators 106 and 122, respectively. An update 116 of the discriminator network 104 can be to better distinguish between the real data 110 and the fake data 112, while an update 118 of the generator network 102 can be to improve the fake data 112 so that the discriminator network 104 mistakes the fake data 112 for the real data 110 (as indicated by fake data real/fake 120); thus the "adversarial" in generative adversarial network. Embodiments can modify the GAN architecture and use a modified GAN architecture to determine whether the discriminator network 104 has experienced drift after deployment.
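
A hedged sketch of the two updates, reusing the generator and discriminator sketches above, is below; the binary cross-entropy loss and Adam optimizers are conventional choices assumed for illustration, not mandated by the embodiments.

    # Illustrative sketch of updates 116 (discriminator) and 118 (generator).
    import torch

    generator = Generator()
    bce = torch.nn.BCEWithLogitsLoss()
    opt_d = torch.optim.Adam(discriminator.parameters(), lr=2e-4)
    opt_g = torch.optim.Adam(generator.parameters(), lr=2e-4)

    def train_step(real_data):
        n = real_data.size(0)
        fake_data = generator(torch.randn(n, 100))  # fake data 112 from noise 108

        # Update 116: better distinguish real data 110 from fake data 112.
        d_loss = (bce(discriminator(real_data).flatten(), torch.ones(n))
                  + bce(discriminator(fake_data.detach()).flatten(), torch.zeros(n)))
        opt_d.zero_grad()
        d_loss.backward()
        opt_d.step()

        # Update 118: improve fake data 112 so the discriminator calls it real.
        g_loss = bce(discriminator(fake_data).flatten(), torch.ones(n))
        opt_g.zero_grad()
        g_loss.backward()
        opt_g.step()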

The GAN architecture, as shown in FIG. 1, can help address a problem of sparse datasets. There is often a discrepancy between the data required to train a robust, accurate ML model and the amount of training data available. A transformed model of the FIG. 1 GAN architecture can provide an ability to identify ML model drift. The ML model drift can be identified per classification, that is, specific to a level of classification. The ML model drift identification process provides clues as to a category or feature of the training data that was nonexistent or sparse at the time of training. Embodiments can use an architecture similar to that of the GAN, called a "modified GAN", to simulate drift and determine whether drift exists in the ML model.

FIG. 2 illustrates, by way of example, a diagram of an embodiment of a modified GAN architecture 200 for quantifying ML model drift. The architecture 200 as illustrated includes the discriminator network 104 of the GAN architecture, such as that illustrated in FIG. 1, after training. However, the architecture 200 includes no update of the discriminator network 104, has the discriminator network 104 operating to classify new generator data and training data 240, and includes a latent vector analyzer 232 to determine a metric 234 that is used by a drift analyzer 236 to classify drift. New generator data in the modified GAN architecture 200 can include fake data 112 from the generator network 102 to help quantify drift of the discriminator network 104.

Drift in the ML model can arise from design changes to the object, dressing (aesthetic) changes to the object, or the like, which the discriminator network 104 may or may not be trained to consider in classification. The change can be made to the object after discriminator network 104 deployment (e.g., between training the discriminator network 104 and testing the drift using the modified GAN architecture 200).

The discriminator network 104 can operate to generate a classification 228 for the fake data 218. The discriminator network 104 as illustrated has a deep neural network (NN) architecture with an input layer 220, hidden layers 222, 224, and an output layer 226. Existence of a hidden layer 222, 224 is typically not visible to an entity operating the model after deployment, thus it is "hidden". However, hidden layer output 230 of one or more hidden layers 222, 224 can be used to analyze whether a model has drifted so as to benefit from additional training.
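
One plausible way to record the hidden layer output 230 from a deployed PyTorch model, such as the discriminator sketch above, is a forward hook; the tapped layer index and the placeholder input batch are assumptions of the sketch.

    # Illustrative sketch: capture hidden layer output 230 without modifying the model.
    import torch

    hidden_outputs = []

    def record(module, inputs, output):
        hidden_outputs.append(output.detach().flatten(1))  # one feature vector per input

    hook = discriminator[8].register_forward_hook(record)  # tap a hidden layer (index assumed)
    batch = torch.randn(16, 3, 32, 32)                     # placeholder input data
    with torch.no_grad():
        logits = discriminator(batch)  # classification 228; hidden output recorded as a side effect
    hook.remove()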

A latent vector analyzer 232 receives the hidden layer output 230 and the class 228. The latent vector analyzer 232 determines one or more metrics 234 based on the hidden layer output 230. The metrics 234 can include an average of the hidden layer output 230 per class 228. The latent vector analyzer 232 can determine an average for each entry, j, in the feature vectors determined by the discriminator network 104 to be associated with class, c. Assuming three classes and ten entries in the feature vector, thirty averages can be determined.

The metrics 234 can include an average confidence per class. The metrics 234 can include a variance of the hidden layer output 230 per class 228. The latent vector analyzer 232 can determine the variance for each entry, j, in the feature vectors determined by the discriminator network 104 to be associated with class, c. Assuming three classes and ten entries in the feature vector, thirty variances can be determined. These averages and variances can then be used with deployment of the discriminator network 104 to determine whether the discriminator network 104 has experienced drift.
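
A minimal sketch of these per-class statistics, assuming the recorded hidden layer outputs are stacked into an N-by-J tensor with one predicted class per row, could look like the following; with three classes and J = 10 it yields the thirty averages and thirty variances described above.

    # Illustrative sketch of the per-class metrics 234.
    import torch

    def per_class_stats(features, classes, num_classes):
        """features: (N, J) hidden layer outputs 230; classes: (N,) predicted class 228."""
        means, variances = [], []
        for c in range(num_classes):
            f_c = features[classes == c]       # feature vectors assigned to class c
            means.append(f_c.mean(dim=0))      # average of each entry j for class c
            variances.append(f_c.var(dim=0))   # variance of each entry j for class c
        return torch.stack(means), torch.stack(variances)  # each (num_classes, J)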

FIG. 3 illustrates, by way of example, a block diagram of an ML system 300 for model drift detection. The ML system 300 includes the discriminator network 104 and a drift analyzer 336. The drift analyzer 336 can determine whether the discriminator network 104, in classifying new data 330 in its deployed environment, is suffering from model drift based on the metrics 234. The average feature vector per class, average confidence per class, variance per class, a different metric, or a combination thereof can indicate whether the discriminator network 104 can benefit from further training to handle the drift. If any of (i) the average confidence for a given class is below a first specified threshold, (ii) the average confidence for a given class is more than a threshold below a prior average confidence for the given class, (iii) the variance for a given class is above a second specified threshold, or (iv) the variance for a given class is more than a threshold above a prior variance for the given class, the drift analyzer 336 can indicate, on the output 332, the class 228 for which further training is beneficial. If such a drift is detected, the real data 216, the fake data 218, other data that represents the drift condition, or a combination thereof can be used to further train the discriminator network 104. The discriminator network 104, after training, can then be re-deployed with confidence that it properly handles the new conditions of its environment.
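
A sketch of the four tests follows; it assumes scalar per-class summaries (e.g., per-entry variances averaged to one value per class), and the threshold values are placeholders rather than values taught by the embodiments.

    # Illustrative sketch of the drift analyzer 336 decision logic.
    def drifted_classes(avg_conf, prior_conf, var, prior_var,
                        conf_floor=0.7, conf_drop=0.1, var_ceiling=1.0, var_rise=0.5):
        flagged = []
        for c in range(len(avg_conf)):
            if (avg_conf[c] < conf_floor                        # (i) low average confidence
                    or prior_conf[c] - avg_conf[c] > conf_drop  # (ii) confidence fell by more than a threshold
                    or var[c] > var_ceiling                     # (iii) high variance
                    or var[c] - prior_var[c] > var_rise):       # (iv) variance rose by more than a threshold
                flagged.append(c)  # class 228 for which further training is beneficial
        return flagged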

ML models are widely applicable. Example applications include, but are not limited to, autonomous vehicles, cyber security, medical diagnostics, and financial applications, among many others.

FIG. 4 illustrates, by way of example, a block diagram of an embodiment of a method 400 for ML model drift detection. The method 400 as illustrated includes receiving machine learning (ML) data defining a deployed ML model, at operation 430; operating the deployed ML model in a modified generative adversarial network (GAN) architecture, at operation 432; while operating the deployed ML model, recording output of a hidden layer of the deployed ML model, at operation 434; determining a metric of the output, at operation 436; and re-deploying the deployed ML model and monitoring whether the re-deployed ML model is suffering from ML model drift based on the metric, at operation 438.

The ML data can define a number of layers of neurons, a number of neurons per each layer, and weights for each neuron of the deployed ML model. The method 400 can further include training the deployed ML model based on input used to determine the metric, resulting in an ML model that does not suffer from the ML model drift. The metric can be determined per class that is classified by the deployed ML model. The metric can include one or more of (i) an average confidence for a given class is below a first specified threshold, (ii) a difference between the average confidence for a given class is more than a threshold less than a prior average confidence for the given class, (iii) a variance for a given class is above a second specified threshold, or (iv) a difference between the variance for a given class is more than a threshold more than a prior variance for the given class.

The modified GAN architecture can include fake data, from a generator of an unmodified GAN architecture that includes the discriminator, input to the discriminator, with the deployed ML model used as the discriminator. The modified GAN architecture can include the deployed ML model configured to classify the fake data independent of real data. The modified GAN architecture includes no feedback loops configured to update weights of neurons of the modified GAN architecture.
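
Continuing the earlier sketches, this wiring can be made explicit by putting the deployed model in evaluation mode and disabling gradient tracking, so no feedback loop can update any weights; the batch size is an assumption.

    # Illustrative sketch: frozen deployed model classifying fake data only.
    import torch

    discriminator.eval()                        # deployed ML model used as the discriminator
    with torch.no_grad():                       # no feedback loops; no weight updates
        fake = generator(torch.randn(16, 100))  # fake data from the unmodified GAN's generator
        scores = discriminator(fake)            # classified independent of real data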

AI is a field concerned with developing decision-making systems to perform cognitive tasks that have traditionally required a living actor, such as a person. NNs are computational structures that are loosely modeled on biological neurons. Generally, NNs encode information (e.g., data or decision making) via weighted connections (e.g., synapses) between nodes (e.g., neurons). Modern NNs are foundational to many AI applications, such as speech recognition.

Many NNs are represented as matrices of weights that correspond to the modeled connections. NNs operate by accepting data into a set of input neurons that often have many outgoing connections to other neurons. At each traversal between neurons, the corresponding weight modifies the input and is tested against a threshold at the destination neuron. If the weighted value exceeds the threshold, the value is again weighted, or transformed through a nonlinear function, and transmitted to another neuron further down the NN graph; if the threshold is not exceeded then, generally, the value is not transmitted to a down-graph neuron and the synaptic connection remains inactive. The process of weighting and testing continues until an output neuron is reached; the pattern and values of the output neurons constitute the result of the NN processing.
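
A toy sketch of a single traversal between layers, using random weights and a ReLU as the thresholded nonlinearity purely for illustration:

    # Illustrative sketch: weight, sum, and threshold at the destination neurons.
    import numpy as np

    def layer_forward(x, weights, bias):
        z = weights @ x + bias     # each connection's weight modifies the input
        return np.maximum(z, 0.0)  # values not exceeding the threshold are not transmitted

    x = np.array([0.5, -1.2, 3.0])  # outputs of the previous layer
    out = layer_forward(x, np.random.randn(4, 3), np.zeros(4))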

The correct operation of most NNs relies on accurate weights. However, NN designers do not generally know which weights will work for a given application. NN designers typically choose a number of neuron layers or specific connections between layers, including circular connections. A training process may be used to determine appropriate weights by selecting initial weights. In some examples, the initial weights may be randomly selected. Training data is fed into the NN and results are compared to an objective function that provides an indication of error. The error indication is a measure of how wrong the NN's result is compared to an expected result. This error is then used to correct the weights. Over many iterations, the weights will collectively converge to encode the operational data into the NN. This process may be called an optimization of the objective function (e.g., a cost or loss function), whereby the cost or loss is minimized.

A gradient descent technique is often used to perform the objective function optimization. A gradient (e.g., partial derivative) is computed with respect to layer parameters (e.g., aspects of the weight) to provide a direction, and possibly a degree, of correction, but does not result in a single correction to set the weight to a "correct" value. That is, via several iterations, the weight will move towards the "correct," or operationally useful, value. In some implementations, the amount, or step size, of movement is fixed (e.g., the same from iteration to iteration). Small step sizes tend to take a long time to converge, whereas large step sizes may oscillate around the correct value or exhibit other undesirable behavior. Variable step sizes may be attempted to provide faster convergence without the downsides of large step sizes.
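
A toy sketch of fixed-step gradient descent on the one-dimensional objective f(w) = (w - 2)^2 illustrates the iteration; the learning rate and iteration count are arbitrary choices.

    # Illustrative sketch: iterative movement toward the operationally useful value.
    def gradient_step(w, lr):
        grad = 2.0 * (w - 2.0)  # partial derivative gives the direction (and degree) of correction
        return w - lr * grad    # a single step does not set the weight to the "correct" value

    w = 10.0
    for _ in range(50):
        w = gradient_step(w, lr=0.1)  # fixed step size; a schedule could vary lr per iteration
    # after many iterations, w has converged near 2.0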

Backpropagation is a technique whereby training data is fed forward through the NN (here "forward" means that the data starts at the input neurons and follows the directed graph of neuron connections until the output neurons are reached) and the objective function is applied backwards through the NN to correct the synapse weights. At each step in the backpropagation process, the result of the previous step is used to correct a weight. Thus, the result of the output neuron correction is applied to a neuron that connects to the output neuron, and so forth until the input neurons are reached. Backpropagation has become a popular technique to train a variety of NNs. Any well-known optimization algorithm for backpropagation may be used, such as stochastic gradient descent (SGD), Adam, etc.
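
A minimal PyTorch sketch of one backpropagation step with SGD is below; the network shape, loss, and data are placeholders assumed for illustration.

    # Illustrative sketch: forward pass, backward error propagation, weight correction.
    import torch

    model = torch.nn.Sequential(torch.nn.Linear(10, 8), torch.nn.ReLU(), torch.nn.Linear(8, 3))
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    loss_fn = torch.nn.CrossEntropyLoss()

    inputs, labels = torch.randn(4, 10), torch.tensor([0, 2, 1, 0])
    logits = model(inputs)          # forward: from input neurons to output neurons
    loss = loss_fn(logits, labels)  # objective function measures the error
    optimizer.zero_grad()
    loss.backward()                 # backward: corrections flow from output toward input
    optimizer.step()                # weights corrected (SGD; Adam would also work)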

FIG. 5 is a block diagram of an example of an environment including a system for neural network training, according to an embodiment. The system can aid in training of a cyber security solution according to one or more embodiments. The system includes an artificial NN (ANN) 505 that is trained using a processing node 510. The processing node 510 may be a central processing unit (CPU), graphics processing unit (GPU), field programmable gate array (FPGA), digital signal processor (DSP), application specific integrated circuit (ASIC), or other processing circuitry. In an example, multiple processing nodes may be employed to train different layers of the ANN 505, or even different nodes 507 within layers. Thus, a set of processing nodes 510 is arranged to perform the training of the ANN 505.

The set of processing nodes 510 is arranged to receive a training set 515 for the ANN 505. The ANN 505 comprises a set of nodes 507 arranged in layers (illustrated as rows of nodes 507) and a set of inter-node weights 508 (e.g., parameters) between nodes in the set of nodes. In an example, the training set 515 is a subset of a complete training set. Here, the subset may enable processing nodes with limited storage resources to participate in training the ANN 505.

The training data may include multiple numerical values representative of a domain, such as a word, symbol, other part of speech, or the like. Each value of the training set, or of the input 517 to be classified once the ANN 505 is trained, is provided to a corresponding node 507 in the first layer, or input layer, of the ANN 505. The values propagate through the layers and are changed by the objective function.

As noted above, the set of processing nodes is arranged to train the neural network to create a trained neural network. Once trained, data input into the ANN will produce valid classifications 520 (e.g., the input data 517 will be assigned into categories), for example. The training performed by the set of processing nodes 510 is iterative. In an example, each iteration of training the neural network is performed independently between layers of the ANN 505. Thus, two distinct layers may be processed in parallel by different members of the set of processing nodes. In an example, different layers of the ANN 505 are trained on different hardware. The different members of the set of processing nodes may be located in different packages, housings, computers, cloud-based resources, etc. In an example, each iteration of the training is performed independently between nodes in the set of nodes. This example is an additional parallelization whereby individual nodes 507 (e.g., neurons) are trained independently. In an example, the nodes are trained on different hardware.

FIG. 6 illustrates, by way of example, a block diagram of an embodiment of a machine in the example form of a computer system 600 within which instructions, for causing the machine to perform any one or more of the methodologies discussed herein, may be executed. In a networked deployment, the machine may operate in the capacity of a server or a client machine in a server-client network environment, or as a peer machine in a peer-to-peer (or distributed) network environment. The machine may be a personal computer (PC), a tablet PC, a set-top box (STB), a Personal Digital Assistant (PDA), a cellular telephone, a web appliance, a network router, switch or bridge, or any machine capable of executing instructions (sequential or otherwise) that specify actions to be taken by that machine. Further, while only a single machine is illustrated, the term "machine" shall also be taken to include any collection of machines that individually or jointly execute a set (or multiple sets) of instructions to perform any one or more of the methodologies discussed herein.

The example computer system 600 includes a processor 602 (e.g., a central processing unit (CPU), a graphics processing unit (GPU), or both), a main memory 604, and a static memory 606, which communicate with each other via a bus 608. The computer system 600 may further include a video display unit 610 (e.g., a liquid crystal display (LCD) or a cathode ray tube (CRT)). The computer system 600 also includes an alphanumeric input device 612 (e.g., a keyboard), a user interface (UI) navigation device 614 (e.g., a mouse), a mass storage unit 616, a signal generation device 618 (e.g., a speaker), a network interface device 620, and a radio 630 such as Bluetooth, WWAN, WLAN, and NFC, permitting the application of security controls on such protocols.

The mass storage unit 616 includes a machine-readable medium 622 on which is stored one or more sets of instructions and data structures (e.g., software) 624 embodying or utilized by any one or more of the methodologies or functions described herein. The instructions 624 may also reside, completely or at least partially, within the main memory 604 and/or within the processor 602 during execution thereof by the computer system 600, the main memory 604 and the processor 602 also constituting machine-readable media.

While the machine-readable medium 622 is shown in an example embodiment to be a single medium, the term "machine-readable medium" may include a single medium or multiple media (e.g., a centralized or distributed database, and/or associated caches and servers) that store the one or more instructions or data structures. The term "machine-readable medium" shall also be taken to include any tangible medium that is capable of storing, encoding, or carrying instructions for execution by the machine and that cause the machine to perform any one or more of the methodologies of the present invention, or that is capable of storing, encoding, or carrying data structures utilized by or associated with such instructions. The term "machine-readable medium" shall accordingly be taken to include, but not be limited to, solid-state memories, and optical and magnetic media. Specific examples of machine-readable media include non-volatile memory, including by way of example semiconductor memory devices, e.g., Erasable Programmable Read-Only Memory (EPROM), Electrically Erasable Programmable Read-Only Memory (EEPROM), and flash memory devices; magnetic disks such as internal hard disks and removable disks; magneto-optical disks; and CD-ROM and DVD-ROM disks.

The instructions 624 may further be transmitted or received over a communications network 626 using a transmission medium. The instructions 624 may be transmitted using the network interface device 620 and any one of a number of well-known transfer protocols (e.g., HTTP). Examples of communication networks include a local area network ("LAN"), a wide area network ("WAN"), the Internet, mobile telephone networks, Plain Old Telephone (POTS) networks, and wireless data networks (e.g., WiFi and WiMax networks). The term "transmission medium" shall be taken to include any intangible medium that is capable of storing, encoding, or carrying instructions for execution by the machine, and includes digital or analog communications signals or other intangible media to facilitate communication of such software.

Additional Notes and Examples

Example 1 includes a device comprising at least one memory including instructions stored thereon, and processing circuitry configured to execute the instructions, the instructions, when executed, cause the processing circuitry to perform operations comprising receiving machine learning (ML) data defining a number of layers of neurons, a number of neurons per each layer, and weights for each neuron of a deployed ML model, operating the deployed ML model in a modified generative adversarial network (GAN) architecture, while operating the deployed ML model, recording output of a hidden layer of the deployed ML model, determining a metric of the output, and re-deploying the ML model and monitoring whether the re-deployed ML model is suffering from ML model drift based on the metric.

In Example 2, Example 1 can further include, wherein the operations further comprise training the deployed ML model based on input used to determine the metric, resulting in an ML model that does not suffer from the ML model drift.

In Example 3, at least one of Examples 1-2 can further include, wherein the metric is determined per class that is classified by the deployed ML model.

In Example 4, Example 3 can further include, wherein the metric includes one or more of (i) an average confidence for a given class is below a first specified threshold, (ii) a difference between the average confidence for a given class is more than a threshold less than a prior average confidence for the given class, (iii) a variance for a given class is above a second specified threshold, or (iv) a difference between the variance for a given class is more than a threshold more than a prior variance for the given class.

In Example 5, at least one of Examples 1-4 can further include, wherein the modified GAN architecture includes fake data, from a generator of an unmodified GAN architecture that includes the discriminator, input to the discriminator.

In Example 6, Example 5 can further include, wherein the modified GAN architecture includes the deployed ML model configured to classify the fake data independent of real data.

In Example 7, Example 6 can further include, wherein the modified GAN architecture includes no feedback loops configured to update weights of neurons of the modified GAN architecture.

Example 8 includes a non-transitory machine-readable medium including instructions that, when executed by a machine, cause the machine to perform operations of the processing circuitry of at least one of Examples 1-7.

Example 9 includes a method including the operations of the processing circuitry of at least one of Examples 1-7.

Although an embodiment has been described with reference to specific example embodiments, it will be evident that various modifications and changes may be made to these embodiments without departing from the broader spirit and scope of the invention. Accordingly, the specification and drawings are to be regarded in an illustrative rather than a restrictive sense. The accompanying drawings that form a part hereof show, by way of illustration and not of limitation, specific embodiments in which the subject matter may be practiced. The embodiments illustrated are described in sufficient detail to enable those skilled in the art to practice the teachings disclosed herein. Other embodiments may be utilized and derived therefrom, such that structural and logical substitutions and changes may be made without departing from the scope of this disclosure. This Detailed Description, therefore, is not to be taken in a limiting sense, and the scope of various embodiments is defined only by the appended claims, along with the full range of equivalents to which such claims are entitled.

What is claimed is:
 1. A device comprising: at least one memory including instructions stored thereon; and processing circuitry configured to execute the instructions, the instructions, when executed, cause the processing circuitry to perform operations comprising: receiving machine learning (ML) data defining a number of layers of neurons, a number of neurons per each layer, and weights for each neuron of a deployed ML model; operating the deployed ML model in a modified generative adversarial network (GAN) architecture; while operating the deployed ML model, recording output of a hidden layer of the deployed ML model; determining a metric of the output; and re-deploying the ML model and monitoring whether the re-deployed ML model is suffering from ML model drift based on the metric.
 2. The device of claim 1, wherein the operations further comprise training the deployed ML model based on input used to determine the metric, resulting in an ML model that does not suffer from the ML model drift.
 3. The device of claim 1, wherein the metric is determined per class that is classified by the deployed ML model.
 4. The device of claim 3, wherein the metric includes one or more of (i) an average confidence for a given class is below a first specified threshold, (ii) a difference between the average confidence for a given class is more than a threshold less than a prior average confidence for the given class, (iii) a variance for a given class is above a second specified threshold, or (iv) a difference between the variance for a given class is more than a threshold more than a prior variance for the given class.
 5. The device of claim 1, wherein the modified GAN architecture includes fake data, from a generator of an unmodified GAN architecture that includes the discriminator, input to the discriminator.
 6. The device of claim 5, wherein the modified GAN architecture includes the deployed ML model configured to classify the fake data independent of real data.
 7. The device of claim 6, wherein the modified GAN architecture includes no feedback loops configured to update weights of neurons of the modified GAN architecture.
 8. A non-transitory machine-readable medium including instructions that, when executed by a machine, cause the machine to perform operations comprising: receiving machine learning (ML) data defining a number of layers of neurons, a number of neurons per each layer, and weights for each neuron of a deployed ML model; operating the deployed ML model in a modified generative adversarial network (GAN) architecture; while operating the deployed ML model, recording output of a hidden layer of the deployed ML model; determining a metric of the output; and re-deploying the deployed ML model and monitoring whether the re-deployed ML model is suffering from ML model drift based on the metric.
 9. The non-transitory machine-readable medium of claim 8, wherein the operations further comprise training the deployed ML model based on input used to determine the metric, resulting in an ML model that does not suffer from the ML model drift.
 10. The non-transitory machine-readable medium of claim 8, wherein the metric is determined per class that is classified by the deployed ML model.
 11. The non-transitory machine-readable medium of claim 10, wherein the metric includes one or more of (i) an average confidence for a given class is below a first specified threshold, (ii) a difference between the average confidence for a given class is more than a threshold less than a prior average confidence for the given class, (iii) a variance for a given class is above a second specified threshold, or (iv) a difference between the variance for a given class is more than a threshold more than a prior variance for the given class.
 12. The non-transitory machine-readable medium of claim 8, wherein the modified GAN architecture includes fake data, from a generator of an unmodified GAN architecture that includes the discriminator, input to the discriminator and the deployed ML model is used as the discriminator.
 13. The non-transitory machine-readable medium of claim 12, wherein the modified GAN architecture includes the deployed ML model configured to classify the fake data independent of real data.
 14. The non-transitory machine-readable medium of claim 13, wherein the modified GAN architecture includes no feedback loops configured to update weights of neurons of the modified GAN architecture.
 15. A method comprising: receiving machine learning (ML) data defining a number of layers of neurons, a number of neurons per each layer, and weights for each neuron of a deployed ML model; operating the deployed ML model in a modified generative adversarial network (GAN) architecture; while operating the deployed ML model, recording output of a hidden layer of the deployed ML model; determining a metric of the output; and re-deploying the deployed ML model and monitoring whether the re-deployed ML model is suffering from ML model drift based on the metric.
 16. The method of claim 15, further comprising training the deployed ML model based on input used to determine the metric, resulting in an ML model that does not suffer from the ML model drift.
 17. The method of claim 15, wherein the metric is determined per class that is classified by the deployed ML model.
 18. The method of claim 17, wherein the metric includes one or more of (i) an average confidence for a given class is below a first specified threshold, (ii) a difference between the average confidence for a given class is more than a threshold less than a prior average confidence for the given class, (iii) a variance for a given class is above a second specified threshold, or (iv) a difference between the variance for a given class is more than a threshold more than a prior variance for the given class.
 19. The method of claim 15, wherein the modified GAN architecture includes fake data, from a generator of an unmodified GAN architecture that includes the discriminator, input to the discriminator and the deployed ML model is used as the discriminator.
 20. The method of claim 19, wherein the modified GAN architecture includes the deployed ML model configured to classify the fake data independent of real data. 